ABSTRACT


Straightforward application of the Schmidt-Appleman contrail formation criteria
to diagnose persistent contrail occurrence from numerical weather prediction data is 
hindered by significant bias errors in the upper tropospheric humidity. Logistic models 
of contrail occurrence have been proposed to overcome this problem, but basic questions 
remain about how random measurement error may affect their accuracy. A set of 5000 
synthetic contrail observations is created to study the effects of random error in these 
probabilistic models. The simulated observations are based on distributions of 
temperature, humidity, and vertical velocity derived from Advanced Regional Prediction 
System (ARPS) weather analyses. The logistic models created from the simulated 
observations were evaluated using two common statistical measures of model accuracy, 
the percent correct (PC) and the Hanssen-Kuipers discriminant (HKD). To convert the 
probabilistic results of the logistic models into a dichotomous yes/no choice suitable for 
the statistical measures, two critical probability thresholds are considered. The HKD 
scores are higher when the climatological frequency of contrail occurrence is used as the 
critical threshold, while the PC scores are higher when the critical probability threshold is 
0.5. For both thresholds, typical random errors in temperature, relative humidity, and 
vertical velocity are found to be small enough to allow for accurate logistic models of 
contrail occurrence. The accuracy of the models developed from synthetic data is over 
85 percent for both the prediction of contrail occurrence and non-occurrence, although in 
practice, larger errors would be anticipated. 
1. Introduction


Contrail-induced cloud cover could be a significant factor in regional climate
change over the United States of America (Minnis et al., 2004). As air traffic increases, 
the potential for globally significant impacts also rises. To better understand and predict 
these potential climatic effects, it is necessary to develop models that can accurately 
represent contrail properties based on ambient atmospheric variables including 
temperature, relative humidity and winds. 

Several high-resolution numerical weather analyses (NWA) including the 20-km 
Rapid Update Cycle (RUC; Benjamin et al., 2004) and the University of Oklahoma 
Center for Analysis and Prediction of Storms (CAPS) Advanced Regional Prediction 
System (ARPS; Xue et al., 2003) can provide the temperature, humidity and wind 
information necessary to diagnose contrail formation and persistence at time and space 
scales close to those of observed contrails. One outstanding problem that must be 
addressed to achieve a realistic simulation of contrails is the uncertainty in upper 
tropospheric relative humidity (UTH) in numerical weather analyses. Current numerical 
weather analyses tend to underestimate UTH due to dry biases in the balloon soundings 
used to construct the analyses (e.g., Minnis et al., 2005). Numerical weather prediction 
models are usually built for the prediction of storms and precipitation, and the accurate 
prediction of UTH is of secondary importance. This underestimation of humidity makes 
the straightforward calculation of contrail formation via the classical Schmidt-Appleman
(Schumann, 1996) thermodynamic criteria, at best, difficult. In addition, numerical 
weather models are modified periodically, leading to changes in the way meteorological 
variables are computed in the model. The contrail forecast model therefore must also be 
modified to reflect these changes, but in an objective and consistent manner. An
additional problem in using numerical weather analyses is that, while their humidity 
fields appear to correlate with the location of persistent contrail coverage, the agreement 
is not exact. Nevertheless, there is some relationship between the structure of the NWA 
humidity fields and the longevity, spreading rate and optical depth of the observed 
contrails. The results from previous studies (e.g., Duda et al., 2004) show that the 
thickest, longest-lasting trails tend to occur in the moistest areas of the NWA. 

To deal with these problems, weather forecasters have used statistically processed 
numerical weather model data to make probabilistic forecasts for many years. One of the 
earliest models reported in the literature was developed by Lund (1955), and the model 
output statistics (MOS) method (Glahn and Lowry, 1972) provided some of the first 
widely used probabilistic forecasts developed from numerical weather forecasts. By 
using a statistical technique such as logistic regression, forecasts of the occurrence or 
non-occurrence of a weather-related event can be derived from the meteorological 
analyses and forecasts provided by operational numerical weather prediction (NWP) 
models. Assuming that the NWP models assimilate data consistently, logistic regression 
can obtain relationships between contrail occurrence and meteorological variables 
without requiring error-free data (which is necessary for the Schmidt-Appleman criteria). 
Logistic regression techniques also provide an objective method to deal with any 
necessary changes due to the reformulation of the NWP model. 

Probabilistic forecasting has already been applied to the contrail formation 
problem. Travis et al. (1997) used a combination of rawinsonde temperature and GOES 
(Geostationary Operational Environmental Satellite) 6.7-μm water vapor absorption data
to develop a logistic model to predict the occurrence of widespread persistent contrail
coverage. Jackson et al. (2001) created a statistical contrail prediction model using 
surface observations and rawinsonde measurements of temperature, humidity and winds. 

Despite the success of these probabilistic forecast models, some questions remain 
about the usefulness of logistic models. Most importantly, neither study attempted to 
determine the potential impacts of random measurement error on the quality of the 
forecasts. In this paper, we assess the ability of logistic models to provide a valuable and 
accurate diagnosis/prediction of persistent contrail occurrence via numerical weather 
models under typical random errors expected in meteorological measurements. 

The next section briefly reviews classical contrail formation theory and its 
limitations, while section 3 introduces the logistic regression technique used to create the 
probabilistic model. A set of probabilistic persistent contrail occurrence forecasts is then 
created from examples of synthetic meteorological data based on operational numerical 
weather analyses, and the effects of random error in the meteorological variables are 
studied in section 4. The final section briefly summarizes and discusses the results. 

2. Brief overview of contrail formation theory 

Many contrail-forecasting techniques rely on Schmidt-Appleman theory to
determine the meteorological conditions necessary for persistent contrail formation. This 
theory is described in detail by Schumann (1996); only a brief description is provided 
here. 

Schmidt-Appleman theory computes a theoretical critical temperature T_c at which
the mixture of aircraft engine exhaust and the ambient air reaches saturation with respect
to water. The critical temperature is a function of the ambient temperature and the fuel
combustion efficiency of the aircraft. Schmidt-Appleman theory assumes that the aircraft
exhaust and ambient air mix adiabatically and isobarically. If the heat and moisture
within this aircraft plume mix similarly, the mixing can be described on a vapor pressure
versus temperature diagram as a straight line. The slope of this mixing line is determined
by the fuel combustion efficiency of the aircraft. Using this mixing line, T_c can be found
either graphically (Appleman, 1953) or numerically (Schrader, 1997) by matching the 
slope of the line with the derivative of the saturation vapor pressure curve with respect to 
temperature on the vapor pressure/temperature diagram. If the ambient vapor pressure is 
greater than or equal to the saturation vapor pressure with respect to ice, a persistent 
contrail will form for temperatures less than or equal to the points along the appropriate 
mixing line. Therefore, for constant aircraft propulsion efficiency, persistent contrail 
formation at a particular pressure level is ostensibly determined by the ambient 
temperature and humidity only. In the context of an operational contrail forecast where
the resolution of the temperature and humidity data is on the order of tens to hundreds
of kilometers, temperature and humidity are not precisely known. To determine the
occurrence or non-occurrence of persistent contrails from Schmidt-Appleman theory,
accurate and consistent meteorological data are required. This requirement limits the
accuracy of contrail prediction models based strictly on the Schmidt-Appleman criteria.
Meteorological data are subject to bias and random measurement errors that must be
corrected before the Schmidt-Appleman theory can be applied successfully.

Another factor complicating the prediction of persistent contrail occurrence is that 
other variables (including vertical velocity and the atmospheric lapse rate) may affect the 
formation and the development of persistent contrails. Duda et al. (2008) matched
several months of contrail coverage statistics derived from surface and satellite 
observations to a number of meteorological variables (including upper tropospheric 
humidity, vertical velocity, wind shear and atmospheric stability) in two operational 
numerical weather analyses. The relationships between contrail occurrence and the 
NWA-derived statistics were analyzed to determine under which atmospheric conditions 
persistent contrail formation is favored within NWAs. Humidity is the most important 
factor determining whether contrails are short-lived or persistent, and persistent spreading 
contrails are more likely to appear when vertical velocities are positive, and when the 
atmosphere is less stable. Because Schmidt-Appleman theory only deals with the 
formation of contrails, and not the development of persistent contrails, these factors are 
not considered in models based on the Schmidt-Appleman criteria. 

To overcome these limitations, probabilistic models using logistic regression have 
been developed. Logistic models can include an arbitrary number of atmospheric
variables related to the occurrence of persistent contrails. The logistic model was also
considered in this study because it can handle the effects of a consistent, systematic bias
error effectively. For example, if all relative humidity measurements used to create a
logistic model of persistent contrail occurrence were reduced in magnitude by 15 percent,
the probabilistic model developed from the modified data would be as accurate as the 
model developed from the original data. It is not as clear, however, how random error 
would impact the logistic model. In the next section, we develop a test model using 
synthetic meteorological data to determine how much random error affects the ability of 
logistic models to forecast persistent contrail occurrence. 
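
The bias argument can be made concrete with a one-predictor sketch. The single-predictor logistic form and the symbols x, x', β_0 and β_1 are used here only for illustration, and the 15 percent reduction is read as a multiplicative factor of 0.85, which is an assumption about the example in the text.

```latex
P = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 x)]}, \qquad x' = 0.85\,x
\;\Longrightarrow\;
\beta_0 + \beta_1 x = \beta_0 + \frac{\beta_1}{0.85}\,x' .
```

Under this assumption, fitting the model to the biased humidities x' changes only the fitted coefficient (to β_1/0.85) and leaves every predicted probability unchanged. An additive offset would be absorbed by β_0 in the same way. Random error offers no such compensating rescaling, which is why its effect has to be tested separately.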
3. Development of logistic models using synthetic data


Logistic models are an effective method to build probabilistic forecasts. Unlike 
the Schmidt-Appleman criteria, logistic models are not affected by a consistent
temperature or humidity bias in the observations used to develop them. We will examine 
a logistic model developed using synthetic meteorological data with perfectly known 
random variances, and use this model to estimate the effects of random error in the 
NWAs on logistic models. 
a. Statistical technique 

Logistic regression (Hosmer and Lemeshow, 1989) can be used to create a 
probabilistic estimate of persistent contrail formation. Logistic regression techniques are 
commonly used where the predictand, such as in this case, is a dichotomous (yes/no) 
variable. Although multiple linear regression can also be used to make probabilistic 
forecasts (e.g., Glahn and Lowry, 1972), logistic regression offers two advantages over 
linear regression. In logistic regression the forecast values cannot fall outside of the 0-1
probability range, and each predictor can be fit in a nonlinear way to the predictand.

The logistic model assumes the following fit:

$$P = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)]} \qquad (1)$$

where P is the predictand (probability of persistent contrail occurrence) and β_i (for i =
1, ..., p) are the set of coefficients used to fit the predictors (x_i) to the model. All
predictors used in this study are based on meteorological quantities in the upper 
troposphere that are assumed to be related physically to the formation of spreading, 
persistent contrails. Initially, we consider two variables that come directly from Schmidt-Appleman
theory (humidity and temperature). Another variable (vertical velocity) will
also be considered for the purpose of examining how the addition of other factors might 
affect the accuracy of the logistic model. 
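
As a minimal sketch of how the logistic fit is evaluated, the function below computes P from a coefficient list and a predictor list. The function name and the example coefficients are hypothetical and are not fitted values from this study.

```python
import math


def contrail_probability(coeffs, predictors):
    """Evaluate P = 1 / (1 + exp(-(b0 + b1*x1 + ... + bp*xp))).

    coeffs[0] is the intercept b0; coeffs[1:] pair with the predictors.
    """
    if len(coeffs) != len(predictors) + 1:
        raise ValueError("need one intercept plus one coefficient per predictor")
    z = coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], predictors))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for two predictors: RHI (percent) and temperature (K).
example_coeffs = [-3.0, 0.08, -0.02]
p = contrail_probability(example_coeffs, [110.0, 220.0])
```

Whatever the coefficients, the result stays strictly between 0 and 1, which is the first advantage over linear regression noted above. The sketch does not guard against overflow for very large negative arguments.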

The maximum likelihood method was used to estimate the unknown coefficients 
β_i, and to fit the logistic regression model to the data. The chi-square statistic (χ²) was
used to assess the goodness of fit of each logistic model to the meteorological data. To
reduce the number of predictors to an optimal number, a stepwise regression technique is
used. In each step of the technique, a new predictor is added to the logistic model and the
chi-square statistic is compared with the previous model. The new predictor that
produces the largest improvement in model fit (that is, the largest increase in χ²) is added
to the model. To avoid overfitting of the model, the stepwise regression technique is 
allowed to add predictors to the model until the test for statistical significance reaches a 
significance level (i.e., p-value) of 0.05. 
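
The selection procedure can be sketched as forward stepwise regression. This sketch assumes a likelihood-ratio chi-square test with one degree of freedom as the significance test and Newton-Raphson for the maximum likelihood fit; the text does not specify either detail. All function names and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, iters=25):
    """Maximum likelihood fit by Newton-Raphson; returns (coeffs, log-likelihood)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        # Tiny ridge on the diagonal keeps the linear solve well conditioned.
        hess = [[1e-9 if i == j else 0.0 for j in range(k)] for i in range(k)]
        for x, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for x, yi in zip(rows, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, x)))
        loglik += math.log(p) if yi else math.log(1.0 - p)
    return beta, loglik


def forward_stepwise(columns, y, alpha=0.05):
    """Add the best predictor each step while its chi-square gain is significant."""
    n = len(y)

    def design(idx):
        return [[1.0] + [columns[j][i] for j in idx] for i in range(n)]

    selected = []
    _, ll_current = fit_logistic(design(selected), y)
    while len(selected) < len(columns):
        trials = []
        for j in range(len(columns)):
            if j not in selected:
                _, ll = fit_logistic(design(selected + [j]), y)
                trials.append((2.0 * (ll - ll_current), j, ll))
        chi2, best, ll_best = max(trials)
        p_value = math.erfc(math.sqrt(max(chi2, 0.0) / 2.0))  # chi-square, 1 dof
        if p_value >= alpha:
            break
        selected.append(best)
        ll_current = ll_best
    return selected


# Toy data: one informative predictor and two unrelated random predictors.
random.seed(0)
n_obs = 400
signal = [random.gauss(0.0, 1.0) for _ in range(n_obs)]
noise_a = [random.uniform(0.0, 1.0) for _ in range(n_obs)]
noise_b = [random.uniform(0.0, 1.0) for _ in range(n_obs)]
y = [1 if random.random() < sigmoid(2.0 * s - 1.0) else 0 for s in signal]
chosen = forward_stepwise([signal, noise_a, noise_b], y)
```

In this toy setup the informative predictor is picked first. An unrelated predictor can still pass a 0.05 test by chance, which mirrors the occasional selection of random variables reported in section 4.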
b. Sample meteorological data 

To build the test model, atmospheric profiles of temperature, humidity, and 
vertical velocity were derived from the 27-km horizontal resolution ARPS in 25-hPa 
intervals from 400 to 150 hPa. The ARPS data were obtained from the hourly contiguous 
United States (CONUS) domain analyses. Due to computing limitations, the ARPS data 
were stored at approximately 1° × 1° resolution. Atmospheric humidity expressed in the
form of relative humidity with respect to ice (RHI) was computed from the ARPS fields 
of potential temperature and specific humidity. 





c. Synthetic meteorological data 

To test the logistic regression technique, a simple set of synthetic meteorological 
data and contrail observations were created based on the ARPS meteorological datasets 
and on Schmidt-Appleman theory. First, distributions of ARPS 250 hPa relative 
humidity with respect to ice (RHI), temperature (TMP), and vertical velocity (W) data 
were created by selecting 176 days of data uniformly throughout 2 years (April 2004 to 
March 2006) of ARPS hourly analyses. Each distribution contains over 7.5 million 
individual data points throughout the ARPS model domain across the CONUS and 
surrounding oceans. These distributions are represented as solid lines in the graphs in 
Figure 1. The relative humidity with respect to ice is distributed more or less uniformly. 
The temperature distribution is somewhat skewed due to the changing temperature 
patterns throughout the year, but during short time periods (one or two days) the ARPS 
250 hPa temperature distribution is almost normally distributed. Figure 1 shows the 
ARPS temperature distribution for 4 - 5 Feb 2006 as a dotted line. The vertical velocity 
distribution is distributed nearly equally about 0 cm s⁻¹, and can be approximated by a
logistic distribution. The logistic distribution can be rewritten as:

$$f(x) = \frac{1}{4s}\,\mathrm{sech}^2\!\left(\frac{x - \mu}{2s}\right) \qquad (2)$$

where μ is the mean of the distribution and s is a shape factor determining the width of
the distribution.

Next, a set of synthetic 250 hPa meteorological data was created to approximate 
the ARPS data. For the humidity data, a random uniform distribution from 5 percent to 
125 percent was used. This humidity distribution is similar in form to the distribution 
used by Buehler and Courcoux (2003) based on radiosonde data. The humidity 
distribution was made slightly moister than the ARPS distribution to offset the suspected
dry bias in the ARPS model (and to increase the overall persistent contrail occurrence 
rate), but this change in the distribution is not expected to affect the overall conclusions 
of this study. The synthetic temperature distribution is a random normal distribution with 
a mean of 223 K and standard deviation of 5 K. The synthetic distribution roughly 
approximates a typical ARPS temperature distribution during January. The vertical 
velocity distribution was approximated by using a random logistic distribution with μ = 0
cm s⁻¹ and s = 1.25 cm s⁻¹. A total of 5000 simulated 250-hPa observations were
produced for each of the three meteorological variables, and the resulting distributions 
are shown in Figure 1 as dashed lines. 
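
The three synthetic distributions described above can be sketched with the Python standard library. The sampling method (inverse CDF for the logistic distribution), the function names and the seed are assumptions; the study does not say how its random numbers were drawn.

```python
import math
import random


def logistic_pdf(x, mu=0.0, s=1.25):
    """Logistic density in the sech^2 form of Eq. (2)."""
    return 1.0 / (4.0 * s) / math.cosh((x - mu) / (2.0 * s)) ** 2


def sample_logistic(rng, mu=0.0, s=1.25):
    """Draw from a logistic distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() never returns 1.0
        u = rng.random()
    return mu + s * math.log(u / (1.0 - u))


def make_synthetic_observations(n=5000, seed=1):
    """Return lists of RHI (percent), temperature (K) and vertical velocity (cm/s)."""
    rng = random.Random(seed)
    rhi = [rng.uniform(5.0, 125.0) for _ in range(n)]
    tmp = [rng.gauss(223.0, 5.0) for _ in range(n)]
    w = [sample_logistic(rng) for _ in range(n)]
    return rhi, tmp, w


rhi, tmp, w = make_synthetic_observations()
```

With s = 1.25 cm s⁻¹ the density at the mean is 1/(4s) = 0.2 per cm s⁻¹, which gives a quick check on the shape factor.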

Finally, persistent contrail occurrences for two scenarios (A and B) were 
determined for each simulated observation using two sets of contrail formation criteria. 

In scenario A, persistent contrail formation occurred when the RHI was 100 percent or 
greater, and the temperature was less than or equal to 226.6 K, which is the critical 
temperature for contrail formation at 250 hPa when RHI = 100 percent and the aircraft
fuel combustion efficiency is 0.4. Scenario A represents persistent contrail formation
simply in terms of Schmidt-Appleman contrail formation theory and assumes only
temperature and humidity influence contrail formation. Because it is expected that other 
meteorological factors affect the development of persistent contrails, scenario B allows 
for the effects of vertical velocity on contrail occurrence. Vertical velocity was selected 
because it is known to affect the occurrence of persistent contrails. Duda et al. (2008) 
showed that surface observations of contrail occurrence appeared to be more likely in 
regions with rising motion in the upper troposphere, and Duda et al. (2004) reported that
sinking motions of 1.5 cm s⁻¹ in the upper troposphere correlated with the suppression of
persistent contrail occurrence in satellite imagery. In scenario B, an adjusted relative
humidity is computed in percent from

$$\mathrm{RHI}_{adj} = \mathrm{RHI} + 5 \times W \quad (W \text{ in cm s}^{-1}). \qquad (3)$$

Contrail occurrence is then determined using the same temperature and humidity
criteria as in scenario A (of course substituting RHI_adj for RHI). Thus, rising motion
would increase the likelihood of contrail occurrence, and sinking motion would decrease 
the likelihood of occurrence. Although this formula is arbitrary and was developed solely 
to demonstrate the possible effects of vertical velocity in contrail forecasting, it is well 
known that rising vertical motion can directly affect humidity by adiabatic cooling. From 
elementary thermodynamic theory (Rogers, 1979), in a well-mixed layer, the change in 
humidity with height when RH = 70 percent and T = 225 K is 6.6 percent per 100 m. Thus,
lifting a parcel 76 m would produce a 5 percent increase in humidity, and would require
approximately 2 hours for a vertical velocity of 1 cm s⁻¹.
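
The two occurrence rules and the lifting arithmetic above can be written out as a short sketch. The function and constant names are hypothetical; the thresholds (100 percent, 226.6 K, 5 percent per cm s⁻¹) are the ones quoted in the text.

```python
T_CRIT_K = 226.6       # critical temperature at 250 hPa quoted in the text
RHI_THRESHOLD = 100.0  # percent


def adjusted_rhi(rhi, w_cm_s):
    """Eq. (3): RHI shifted by 5 percent per cm/s of vertical velocity."""
    return rhi + 5.0 * w_cm_s


def persistent_contrail(rhi, tmp_k, w_cm_s=0.0, scenario="A"):
    """Return True when the scenario's humidity and temperature criteria are met."""
    if scenario == "A":
        humidity = rhi
    elif scenario == "B":
        humidity = adjusted_rhi(rhi, w_cm_s)
    else:
        raise ValueError("scenario must be 'A' or 'B'")
    return humidity >= RHI_THRESHOLD and tmp_k <= T_CRIT_K


# Lifting needed for a 5 percent humidity increase at 6.6 percent per 100 m,
# and the time that lifting takes at 1 cm/s (0.01 m/s).
lift_m = 5.0 / 6.6 * 100.0
lift_hours = lift_m / 0.01 / 3600.0
```

For example, an observation with RHI = 98 percent, T = 220 K and W = 1 cm s⁻¹ has no persistent contrail in scenario A but does in scenario B, because the adjusted humidity is 103 percent.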
d. Predictors and skill scores 

In addition to the three synthetic data variables, 19 other predictors were selected
to develop the test case contrail prediction models (Table 1). Five additional predictors 
are uniformly distributed random variables that have no relation to the predictand, and 
four more are a product of a synthetic data variable and an unrelated random variable. 
These variables are included to test the ability of the regression method to accept or reject 
data that are known to be unrelated to the predictand. Another six predictors are the 
products of one or more of the three synthetic data variables, while the remaining four 
variables are more complicated combinations of vertical velocity and another synthetic 
meteorological variable. In particular, variable R5V (RHI + 5 × W) reflects the adjusted
RHI used in scenario B. 

Two groups of statistical contrail models (scenarios A and B) then were derived 
from the database of 5000 synthetic contrail observations and the 19 selected predictors. 
For simplicity, both sets of models are fit to all 5000 observations, and the results are 
verified using the same 5000 observations. To determine the accuracy of the contrail 
models, two statistical measures were employed. Both of these measures have been used 
to quantify the accuracy of previous categorical (i.e., yes/no, occurrence/non-occurrence)
contrail formation forecasts (Jackson et al., 2001; Walters et al., 2000). The contrail 
formation forecasts are separated into four categories based on the forecast and its 
outcome: a is the number of cases where persistent contrail formation is forecasted, and 
persistent contrails are observed (hits); b is the number of cases where contrails are 
predicted, but no contrails are observed (false alarms); c is the number of cases where 
contrails are not forecasted, but contrails are observed (misses), and d is the number of 
cases where contrails are not forecasted and no contrails are observed (correct rejections). 
The first measure is the percent correct (PC), and is calculated as (a + d)/(a + b + c + d). 
The percent correct represents the percentage of forecasts in which the method correctly 
predicted the observed event. The second measure is known as the Hanssen-Kuipers
discriminant or the true skill statistic (HKD) (Wilks, 1995). The HKD is calculated as
(ad - bc) / [(a + c)(b + d)]. This measure of forecasting skill can also be interpreted as
(accuracy for events) + (accuracy for non-events) - 1, and measures the skill of the “yes”
and “no” forecasts of contrail occurrence equally, regardless of the relative numbers of
each forecast. Although in cases where the forecasted event is rare (such as contrail
occurrence) HKD might be viewed as unduly rewarding “yes” forecasts, Gandin and
Murphy (1992) show that HKD is the only equitable skill score for a two-event (i.e., yes- 
or-no) forecast. Equitable skill scores require that constant forecasts of a particular event 
are not favored over constant forecasts of other events (in this case, the “no” forecast 
should not be favored because persistent contrails rarely form, and thus a “no” forecast 
would most likely be the correct forecast).
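
Both skill scores follow directly from the four contingency-table counts. The sketch below uses hypothetical counts, not results from this study, and also checks numerically that HKD equals the hit rate plus the correct-rejection rate minus one.

```python
def contingency_table(forecasts, observations):
    """Count hits (a), false alarms (b), misses (c) and correct rejections (d)."""
    a = b = c = d = 0
    for f, o in zip(forecasts, observations):
        if f and o:
            a += 1
        elif f and not o:
            b += 1
        elif not f and o:
            c += 1
        else:
            d += 1
    return a, b, c, d


def percent_correct(a, b, c, d):
    return (a + d) / (a + b + c + d)


def hanssen_kuipers(a, b, c, d):
    return (a * d - b * c) / ((a + c) * (b + d))


# Hypothetical counts for a rare event (100 occurrences in 1000 cases).
a, b, c, d = 80, 100, 20, 800
pc = percent_correct(a, b, c, d)
hkd = hanssen_kuipers(a, b, c, d)
hit_rate = a / (a + c)
correct_rejection_rate = d / (b + d)

# A constant "no" forecast for the same 1000 cases.
pc_always_no = percent_correct(0, 0, 100, 900)
hkd_always_no = hanssen_kuipers(0, 0, 100, 900)
```

The constant "no" forecast scores a percent correct of 0.9 but an HKD of 0, which illustrates why an equitable score is preferred when the event is rare.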

The logistic regression provides a probability of occurrence for an event between 
0 and 1, but the skill scores rely on a dichotomous yes/no (persistent contrail 
occurs/persistent contrail does not occur) choice. What is the appropriate probability 
threshold to discriminate between “yes” and “no”? Jackson et al. (2001) predicted 
contrails when the probability was 0.5 or more, and predicted no contrail when the 
probability was less than 0.5. Gandin and Murphy (1992) argue that the critical threshold 
for translating probabilistic forecasts into categorical forecasts in the two-event situation 
is the climatological mean probability of the event. In the case of Jackson et al. (2001), 
the climatological mean probability of contrail occurrence (either persistent or non- 
persistent) was near 0.5 (0.64), but the occurrence of persistent contrails is a relatively 
rare event, and the choice of threshold is pertinent. In this study we test the effects of 
both thresholds on contrail forecast model accuracy. 
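
The effect of the two thresholds can be seen in a small sketch. The probabilities and observations below are made up for illustration, and the function names are hypothetical.

```python
def climatological_frequency(observed):
    """Fraction of cases in which the event was observed."""
    return sum(1 for o in observed if o) / len(observed)


def to_yes_no(probabilities, threshold):
    """Forecast 'yes' when the probability is at or above the threshold."""
    return [p >= threshold for p in probabilities]


# Hypothetical model output and observations for ten cases.
probabilities = [0.02, 0.05, 0.10, 0.25, 0.30, 0.04, 0.15, 0.08, 0.01, 0.70]
observed = [False, False, False, False, True, False, False, False, False, True]

clim = climatological_frequency(observed)
yes_at_half = to_yes_no(probabilities, 0.5)
yes_at_clim = to_yes_no(probabilities, clim)

misses_at_half = sum(1 for f, o in zip(yes_at_half, observed) if o and not f)
misses_at_clim = sum(1 for f, o in zip(yes_at_clim, observed) if o and not f)
```

In this made-up sample the climatological threshold (0.2) issues three "yes" forecasts and misses no events, while the 0.5 threshold issues one "yes" forecast and misses one event, at the cost of no false alarms.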
e. Random error 

As mentioned earlier, the logistic model was considered in this study because it 
can handle the effects of a consistent, systematic bias error effectively. The effects of 
random error on the model, however, are not as clear. To study the impact of random 
error on the logistic model, various levels of normally distributed random error were 
added to the database of 5000 synthetic observations. Table 2 presents the different
random errors used in the simulations. The random errors are expressed in terms of the 
standard deviation of the added random error. Each of the contrail models developed in 
this section is named using the following convention. Models developed using the 
climatological mean probability as the critical threshold are designated as A1x or B1x,
while models using 0.5 as the threshold are called A2x or B2x, where x is the random
error label described in Table 2, and A and B refer to the contrail formation criteria used
to determine contrail occurrence. Note that although each logistic model is created using
perturbed meteorological data (except cases A1a, A2a, B1a and B2a), the forecasts of
contrail occurrence from those models are always compared to the same set of contrail 
occurrences that is based on the original, unperturbed data. 
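
The perturbation step can be sketched as follows. The standard deviations used here are placeholders and are not the values in Table 2; the function name and the seed are also hypothetical.

```python
import random


def add_random_error(values, sigma, rng):
    """Return a copy of values with zero-mean normal error of standard deviation sigma."""
    return [v + rng.gauss(0.0, sigma) for v in values]


rng = random.Random(42)

# Unperturbed synthetic observations (RHI in percent, temperature in K).
true_rhi = [rng.uniform(5.0, 125.0) for _ in range(5000)]
true_tmp = [rng.gauss(223.0, 5.0) for _ in range(5000)]

# Contrail occurrence is always diagnosed from the unperturbed data (scenario A).
occurred = [r >= 100.0 and t <= 226.6 for r, t in zip(true_rhi, true_tmp)]
occurrence_rate = sum(occurred) / len(occurred)

# The logistic model would then be fit to the perturbed predictors.
noisy_rhi = add_random_error(true_rhi, sigma=10.0, rng=rng)
noisy_tmp = add_random_error(true_tmp, sigma=1.0, rng=rng)

rhi_errors = [n - t for n, t in zip(noisy_rhi, true_rhi)]
error_sd = (sum(e * e for e in rhi_errors) / len(rhi_errors)) ** 0.5
```

For the stated distributions the expected scenario A occurrence rate is about (25/120) × 0.764 ≈ 0.159, so the sampled rate should fall close to that value.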

Although the random errors chosen for this study are intended to demonstrate the 
effect of the error on the logistic model, the actual expected magnitude of the 
meteorological errors is not certain. The values chosen for this study are based on 
previous estimates. Walters et al. (2000) estimate uncertainties in temperature of ±2 K 
resulting from measurement errors by radiosonde and spatial and temporal differences 
between the radiosonde measurement and the contrail observation, and relative humidity 
errors of -7.5 percent due to a systematic bias in radiosonde measurements. Gettelman et 
al. (2006) report a comparison of Atmospheric Infrared Sounder (AIRS) data with in situ 
aircraft measurements of temperature and relative humidity. The standard deviation of 
the differences between AIRS and in situ data was 1.5 K or less for temperature, and was
9 percent for relative humidities at pressure levels below 250 hPa. The root-mean-square
differences between upper tropospheric temperature and relative humidity computed in
the RUC analyses and radiosonde observations are 0.5 K and 8 percent, respectively, at
300 hPa for the period from 11 September to 31 December 2002 (Benjamin et al.,
2004). Mapes et al. (2003) studied random errors in tropical rawinsonde-array budgets, 
and determined that the unresolved variability in such arrays is 0.5 K for temperature 
measurements, and 15 percent for relative humidity measurements in the middle-upper 
troposphere. The random error in computed vertical velocity resulting from errors in the 
vertical integration of wind divergence was estimated by Mapes et al. to be on the order 
of 4 × 10⁻⁴ hPa s⁻¹, or approximately 1 cm s⁻¹ based on typical meteorological conditions at
250 hPa. We expect that the values of random error in Table 2 are at least representative
of the random errors likely to be present in the RUC/ARPS data. Although the Mapes et
al. study is based on tropical soundings, which probably have less variability than mid-latitude
soundings where most persistent contrails occur, the RUC/ARPS models benefit
from finer spatial and temporal resolution than rawinsonde arrays. 
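
The conversion of the vertical velocity error from pressure units to cm s⁻¹ can be checked with the hydrostatic relation |w| ≈ |ω| / (ρg). This sketch assumes the quoted value reads 4 × 10⁻⁴ hPa s⁻¹, takes T = 223 K as a typical 250 hPa temperature (the synthetic mean used in this study), and treats the air as dry; the function name is hypothetical.

```python
R_DRY = 287.05  # gas constant for dry air, J kg^-1 K^-1
G = 9.80665     # gravitational acceleration, m s^-2


def omega_to_w_cm_s(omega_hpa_s, pressure_hpa, temperature_k):
    """Convert the magnitude of pressure velocity (hPa/s) to vertical velocity (cm/s)."""
    density = pressure_hpa * 100.0 / (R_DRY * temperature_k)  # kg m^-3, ideal gas law
    w_m_s = omega_hpa_s * 100.0 / (density * G)               # hydrostatic conversion
    return w_m_s * 100.0


w_error = omega_to_w_cm_s(4e-4, 250.0, 223.0)
```

Under these assumptions the result is close to 1 cm s⁻¹, in line with the approximate value quoted above.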

4. Results from synthetic data set 

The stepwise regression technique was applied to the original 5000 synthetic 
observations, and to the set of 12 perturbed observations containing the various levels of 
random error described in Table 2. Each contrail formation scenario therefore produced 
13 logistic models, and probability forecasts for each model were converted into 2 sets of
yes/no persistent contrail occurrence “forecasts” based on the two critical probability 
thresholds. The skill scores computed for each contrail formation scenario are presented 
and discussed in the next two subsections. 
a. Scenario A


The temperature and relative humidity criteria described in section 3c are the only
variables that determine persistent contrail occurrence for scenario A. Although the 
stepwise regression technique would sometimes produce more than one (equally 
accurate) set of predictors for each of the 13 datasets, and the chosen groups of predictors 
sometimes varied between datasets, one group of predictors was most commonly chosen. 
For scenario A, the preferred set of predictors was RHI, TMP, TMP2, and RT. Table 3 
presents the skill scores for each of the 13 datasets and both sets of critical probability 
thresholds for forecasts based on these four predictors. The climatological occurrence 
rate is simply the overall occurrence rate of persistent contrails determined from the 
contrail formation criteria in scenario A applied to the original 5000 synthetic 
observations, and equals 0.1598. A comparison of scenarios A1 and A2 shows that the 
choice of 0.5 as the critical probability threshold increases PC but decreases HKD, 
because the occurrence of contrail persistence is relatively rare. The use of the critical 
probability threshold of 0.5 increases the number of “no” forecasts, which is the more 
likely event. Conversely, the HKD decreases because it tends to reward the prediction of 
rare events more than common events. Using the climatological occurrence rate tends to 
increase the number of “yes” forecasts and leads to an increase in the number of false 
alarms, but it also decreases the number of misses. 

The accuracy of the logistic models remains high regardless of the random error 
added to the synthetic meteorological data. Even in case m, the HKD for scenario A1 is 
0.735, and the accuracy of the yes and no forecasts is 89 and 85 percent respectively. 
Random errors in relative humidity tend to affect the accuracy of the scenario A logistic 
models the most, and, of course, random errors in vertical velocity have no effect on
model accuracy. 
b. Scenario B 

In scenario B, persistent contrail occurrence is controlled by temperature and a 
vertical velocity-adjusted relative humidity. Because the determination of contrail 
occurrence is more complicated in scenario B, the accuracy of the logistic models is 
slightly less overall than in scenario A. The best overall set of predictors for scenario B 
is TMP, TMP2, RT, RV, TV, T5V, and R10V. The skill scores for each of the models 
derived from these seven predictors are presented in Table 3. The PC values range from 0.970
for the error-free case B1a to 0.843 for case B1m with the largest random errors. The
random errors in relative humidity tend to have the largest impact on the accuracy of the 
forecast models, and temperature errors have the smallest effect. 

A comparison of the skill scores from the 4-predictor models with the skill scores 
from the 7-predictor models shows that for scenario A the results are nearly identical. 
For scenario B, the 7-predictor models have about 5 percent better (absolute) accuracy 
than the 4-predictor models when the random errors are small, and the models have 
nearly the same accuracy for the cases with the largest random errors. The influence of 
vertical velocity on the determination of contrail occurrence in this simulation is therefore 
minor, although the actual effects of vertical velocity on persistent contrail occurrence are 
not well known. 

Although not shown here, other sets of predictors were sometimes chosen by the 
stepwise regression technique as the best model. The skill scores from those predictor 
sets were similar to the presented results. Not surprisingly, the logistic regression method 
nearly always chose some combination of relative humidity, temperature, and vertical 
velocity (for scenario B) as predictors. Rarely, one of the random variables was chosen 
as one of the predictors, but only for the cases with the largest random error. Thus, the 
logistic model was able to distinguish the proper predictors from a group of random 
variables, but sometimes variables such as R10V with subtle differences from the actual 
contrail occurrence selector were chosen ahead of the true selector (R5V). 
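The idea of separating a true predictor from random variables can be sketched as follows. This is not the stepwise procedure used in the study, whose details are not given in this section. It is a simple greedy forward selection that adds, at each step, the candidate giving the highest log-likelihood of a logistic fit. The data are hypothetical standardized values, and the fitting routine is a plain gradient ascent written for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(columns, y, steps=300, rate=0.5):
    """Fit an intercept plus one weight per column by batch gradient ascent."""
    n = len(y)
    weights = [0.0] * (len(columns) + 1)  # weights[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for i in range(n):
            z = weights[0] + sum(w * col[i] for w, col in zip(weights[1:], columns))
            err = y[i] - sigmoid(z)
            grad[0] += err
            for j, col in enumerate(columns):
                grad[j + 1] += err * col[i]
        weights = [w + rate * g / n for w, g in zip(weights, grad)]
    return weights


def log_likelihood(weights, columns, y):
    """Log-likelihood of the observations under the fitted logistic model."""
    total = 0.0
    for i in range(len(y)):
        z = weights[0] + sum(w * col[i] for w, col in zip(weights[1:], columns))
        p = min(max(sigmoid(z), 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if y[i] else math.log(1.0 - p)
    return total


def forward_select(candidates, y, max_predictors):
    """Greedily add the candidate that most improves the log-likelihood."""
    chosen = []
    while len(chosen) < min(max_predictors, len(candidates)):
        best_name, best_ll = None, -math.inf
        for name in candidates:
            if name in chosen:
                continue
            cols = [candidates[k] for k in chosen + [name]]
            ll = log_likelihood(fit_logistic(cols, y), cols, y)
            if ll > best_ll:
                best_name, best_ll = name, ll
        chosen.append(best_name)
    return chosen


# Hypothetical standardized data: occurrence depends on "RHI" plus some noise;
# RAND01 and RAND02 are unrelated random variables.
rng = random.Random(0)
n = 300
rhi = [rng.gauss(0.0, 1.0) for _ in range(n)]
rand01 = [rng.gauss(0.0, 1.0) for _ in range(n)]
rand02 = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if r + 0.3 * rng.gauss(0.0, 1.0) > 0.0 else 0 for r in rhi]

candidates = {"RAND01": rand01, "RHI": rhi, "RAND02": rand02}
selected = forward_select(candidates, y, max_predictors=1)
print(selected)
```

With a strongly informative predictor and unrelated noise variables, the first variable selected is the informative one. Closely related candidates (such as R5V and R10V in the text) would be much harder to tell apart, especially when random error is added.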

The results from this test case based on the synthetic meteorological data 
demonstrate that the logistic method can develop highly accurate contrail prediction 
models based on expected levels of random error in the meteorological data. We note, 
however, that these results represent a best-case scenario for the logistic regression 
technique. The factors that affect contrail occurrence are few and well known, 
and all are included in the set of potential predictors. It is implicitly assumed that all of 
the synthetic observations occur within areas of air traffic, so that persistent contrails will 
occur if the conditions favor occurrence. Logistic models created using actual 
meteorological data and contrail occurrence observations are not expected to be as 
accurate. For a more complete assessment of contrail model accuracy, Duda and Minnis 
(2008, part II of this paper) show examples of logistic models developed from numerical 
weather model data and from actual contrail observations. 

5. Summary and concluding remarks 

Straightforward application of the contrail formation criteria from Schmidt-Appleman 
theory to diagnose persistent contrail occurrence is hindered by significant 
humidity errors within numerical weather prediction models. Logistic models of contrail 
occurrence have been proposed to overcome these problems, but basic questions remain 
about their accuracy. To investigate logistic models, we created a set of 5000 synthetic 
contrail observations to study the effects of random error in meteorological variables on 
the development of these probabilistic models. The simulated observations are based on 
distributions of temperature, humidity, and vertical velocity derived from Advanced 
Regional Prediction System (ARPS) weather analyses. The logistic models created from 
the simulated observations were evaluated using two common statistical measures of 
model accuracy, the percent correct (PC) and the Hanssen-Kuipers discriminant (HKD). 
To convert the probabilistic results of the logistic models into a dichotomous yes/no 
choice suitable for the statistical measures, two critical probability thresholds are 
considered. The HKD scores are higher when the climatological frequency of contrail 
occurrence is used as the critical threshold, while the PC scores are higher when the 
critical probability threshold is 0.5. For both thresholds, typical random errors in 
temperature, relative humidity, and vertical velocity derived from comparison with 
radiosonde measurements are found to be small enough to allow for accurate logistic 
models of contrail occurrence. The accuracy of the models developed from synthetic 
data is over 85 percent for both the prediction of contrail occurrence and non-occurrence. 
In practice, larger errors would be anticipated because persistent contrails are expected to 
be influenced by more atmospheric variables (and thus more uncertainty) than those 
presented in this study. 
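The construction summarized above can be sketched as follows. The distribution families (uniform for RHI, normal for temperature, logistic for vertical velocity) follow the Fig. 1 caption, and the error standard deviations follow Table 2. The distribution parameters themselves (ranges, means, scales) are hypothetical placeholders because they are not stated in this section, and the function names are illustrative.

```python
import math
import random

# Standard deviations of the added random error from Table 2:
# (TMP in K, RHI in percent, vertical velocity in cm/s).
ERROR_SCENARIOS = {
    "a": (0.0, 0.0, 0.0), "b": (1.0, 0.0, 0.0), "c": (0.0, 5.0, 0.0),
    "d": (0.0, 0.0, 1.0), "e": (2.0, 0.0, 0.0), "f": (0.0, 10.0, 0.0),
    "g": (0.0, 0.0, 2.0), "h": (3.0, 0.0, 0.0), "i": (0.0, 15.0, 0.0),
    "j": (0.0, 0.0, 3.0), "k": (1.0, 5.0, 1.0), "l": (2.0, 10.0, 2.0),
    "m": (3.0, 15.0, 3.0),
}


def sample_logistic(rng, loc, scale):
    """Draw from a logistic distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return loc + scale * math.log(u / (1.0 - u))


def synthetic_observations(n, rng):
    """Return n error-free (TMP, RHI, VV) tuples. All parameters are assumed."""
    observations = []
    for _ in range(n):
        tmp = rng.gauss(220.0, 5.0)            # assumed mean and spread, in K
        rhi = rng.uniform(0.0, 150.0)          # assumed range, in percent
        vv = sample_logistic(rng, 0.0, 1.0)    # assumed location and scale, cm/s
        observations.append((tmp, rhi, vv))
    return observations


def add_random_error(observations, label, rng):
    """Add normally distributed error with the Table 2 standard deviations."""
    s_tmp, s_rhi, s_vv = ERROR_SCENARIOS[label]
    return [
        (tmp + rng.gauss(0.0, s_tmp),
         rhi + rng.gauss(0.0, s_rhi),
         vv + rng.gauss(0.0, s_vv))
        for tmp, rhi, vv in observations
    ]


rng = random.Random(42)
clean = synthetic_observations(5000, rng)
noisy = add_random_error(clean, "m", rng)  # largest random errors
print(clean[0], noisy[0])
```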

Some open questions about the effectiveness of the logistic model are not 
addressed here and require future study. The synthetic dataset not only has perfectly 
known meteorological data, but the occurrence of contrails is also precisely known. In 
practice, the occurrence of contrails is not always known; cloud cover may obscure both surface and 
satellite observations of contrails, and observations may not always be available for all 
times and locations. Also, aircraft may not fly at all times through some regions where 
persistent contrails are possible, although this is not expected to be a major problem for 
this study as most of the CONUS is nearly continuously traveled by jet aircraft 
throughout the day. The impacts of these factors on the determination of contrail 
occurrence by logistic models should be quantified. 

More work is needed to realize the potential of logistic contrail forecasts. The 
most direct way to make the logistic models better is to reduce the errors within the 
meteorological data used to build the models. Meteorological errors directly affect the 
regressions developed in the logistic model, and if the errors are large enough, may cause 
the model to choose less pertinent predictors, further reducing model accuracy. 
Meteorological analyses could be improved by using the Atmospheric Infrared Sounder 
(AIRS) onboard the Aqua satellite to supplement the temperature and relative humidity 
data in numerical weather models. Methods to reduce errors in the determination of 
contrail occurrence could also be pursued. Additional studies are needed to determine if 
other regionally or temporally averaged variables would increase the accuracy of logistic 
models based on numerical weather forecasts, and if other atmospheric variables may be 
relevant. Regional and seasonal models of contrail occurrence may help improve the 
overall performance of this type of persistent contrail prediction model. Finally, logistic 
models of contrail occurrence provide an additional advantage that has not been used 
here. Because logistic models compute a probability of occurrence, they could be useful 
in global circulation model (GCM) simulations of contrail coverage (Ponater et al., 2002; 
Marquart et al., 2003) to determine the impact of contrail radiative forcing on global 
climate. Such models use a simple analytical formula based on relative humidity and 
cirrus cloud coverage to determine contrail coverage. The logistic models could be easily 
used within the GCM to determine an appropriate contrail coverage fraction for a region 
based upon the product of the air traffic and the computed probability. Because the 
logistic model could be developed by comparing GCM simulations to actual 
contrail observations, it may provide more accurate simulations of contrail coverage than 
current methods. 
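The coverage estimate suggested above can be sketched in a few lines. The logistic form of the probability is standard, but the intercept, coefficients, predictor values, and the air traffic fraction below are hypothetical placeholders, not fitted values from this study. How air traffic would be expressed within a GCM grid box is likewise an assumption.

```python
import math


def logistic_probability(intercept, coefficients, predictors):
    """Probability of occurrence from a logistic model: 1 / (1 + exp(-z))."""
    z = intercept + sum(coefficients[name] * predictors[name] for name in coefficients)
    if z >= 0:  # numerically stable evaluation
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def contrail_coverage(traffic_fraction, probability):
    """Coverage fraction as the product of air traffic and probability."""
    return traffic_fraction * probability


# Hypothetical coefficients and grid-box values, for illustration only.
coefficients = {"RHI": 0.08, "TMP": -0.02}
grid_box = {"RHI": 110.0, "TMP": 220.0}
probability = logistic_probability(-4.0, coefficients, grid_box)
coverage = contrail_coverage(traffic_fraction=0.2, probability=probability)
print(probability, coverage)
```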

List of Figures 

Fig. 1. (a) Normalized probability density functions of 250-hPa RHI computed from the 
ARPS model over the model domain (solid line) and a 5000-point simulated distribution 
(dashed line) based on a random uniform distribution. (b) Normalized probability density 
functions of 250-hPa temperature computed from the ARPS model over an 18-month 
period (solid line), from the ARPS model over a two-day period in February 2006 (dotted 
line), and a 5000-point simulation based on a random normal distribution (dashed line). 
(c) Normalized probability density functions of 250-hPa vertical velocity computed from 
the ARPS model (solid line) and a 5000-point random logistic distribution (dashed line). 

Table 1. Atmospheric parameters used as predictors in the logistic models. 

Number  Parameter                                            Name
------  ---------------------------------------------------  ----------------
0       250 hPa relative humidity with respect to ice        RHI (in percent)
1       250 hPa temperature                                  TMP (in K)
2       250 hPa vertical velocity                            VV (in cm s^-1)
3       Lapse rate (uniform random variable from -10 to -6)  LRT
4       Uniform random variable from -50 to +50              RAND01
5       Uniform random variable from 0 to 100                RAND02
6       Uniform random variable from -7 to +3                RAND03
7       Uniform random variable from 0 to 10                 RAND04
8       RHI x RHI                                            RHI2
9       TMP x TMP                                            TMP2
10      RHI x TMP                                            RT
11      RHI x VV                                             RV
12      TMP x VV                                             TV
13      VV x VV                                              VV2
14      RHI x LRT                                            RL
15      TMP x LRT                                            TL
16      VV x LRT                                             VL
17      LRT x LRT                                            LRT2
18      RHI + 5 x VV                                         R5V
19      TMP + 5 x VV                                         T5V
20      RHI + 10 x VV                                        R10V
21      TMP + 10 x VV                                        T10V

Table 2. Scenarios of normally distributed random error added to the synthetic 
meteorological measurements. The magnitude of the added random error is represented 
in each scenario in terms of the standard deviation of the error. 


Scenario label  TMP error (in K)  RHI error (in percent)  VV error (in cm s^-1)
--------------  ----------------  ----------------------  ---------------------
a               0                 0                       0
b               1                 0                       0
c               0                 5                       0
d               0                 0                       1
e               2                 0                       0
f               0                 10                      0
g               0                 0                       2
h               3                 0                       0
i               0                 15                      0
j               0                 0                       3
k               1                 5                       1
l               2                 10                      2
m               3                 15                      3

Table 3. Skill scores (PC/HKD) computed for each of the 13 synthetic meteorological 
datasets based on a set of 4 predictors or a set of 7 predictors. Each scenario represents a 
combination of critical probability threshold and contrail occurrence criteria. 

Predictors: RHI, TMP, TMP2, RT 

Label  Scenario A1  Scenario A2  Scenario B1  Scenario B2
-----  -----------  -----------  -----------  -----------
a      0.971/0.948  0.982/0.928  0.908/0.845  0.935/0.736
b      0.965/0.940  0.979/0.915  0.905/0.840  0.933/0.734
c      0.945/0.907  0.962/0.850  0.898/0.830  0.930/0.720
d      0.971/0.948  0.982/0.928  0.908/0.845  0.935/0.736
e      0.954/0.929  0.970/0.884  0.900/0.829  0.926/0.702
f      0.910/0.850  0.937/0.740  0.880/0.794  0.914/0.639
g      0.971/0.948  0.982/0.928  0.908/0.845  0.935/0.736
h      0.942/0.912  0.963/0.856  0.889/0.811  0.921/0.682
i      0.876/0.786  0.915/0.634  0.852/0.746  0.899/0.556
j      0.971/0.948  0.982/0.928  0.908/0.845  0.935/0.736
k      0.943/0.904  0.957/0.831  0.895/0.821  0.924/0.693
l      0.897/0.822  0.930/0.713  0.868/0.770  0.907/0.608
m      0.853/0.735  0.906/0.589  0.831/0.696  0.888/0.496

Predictors: TMP, TMP2, RT, RV, TV, T5V, R10V 

Label  Scenario A1  Scenario A2  Scenario B1  Scenario B2
-----  -----------  -----------  -----------  -----------
a      0.971/0.948  0.981/0.927  0.970/0.952  0.978/0.917
b      0.965/0.940  0.978/0.908  0.964/0.941  0.975/0.901
c      0.945/0.906  0.962/0.850  0.949/0.921  0.962/0.846
d      0.970/0.946  0.981/0.928  0.951/0.922  0.967/0.867
e      0.954/0.928  0.971/0.883  0.951/0.924  0.968/0.869
f      0.909/0.849  0.939/0.744  0.919/0.865  0.941/0.754
g      0.970/0.947  0.982/0.933  0.933/0.886  0.955/0.812
h      0.942/0.912  0.963/0.851  0.942/0.912  0.962/0.845
i      0.876/0.784  0.916/0.635  0.883/0.789  0.920/0.655
j      0.970/0.947  0.982/0.932  0.922/0.870  0.947/0.786
k      0.943/0.905  0.958/0.837  0.928/0.879  0.950/0.793
l      0.897/0.822  0.930/0.710  0.884/0.790  0.920/0.659
m      0.853/0.737  0.906/0.588  0.843/0.711  0.898/0.540